
    A corpus-based semantic kernel for text classification by using meaning values of terms

    Text categorization plays a crucial role in both academic and commercial platforms due to the growing demand for automatic organization of documents. Kernel-based classification algorithms such as Support Vector Machines (SVM) have become highly popular in text mining, mainly because of their relatively high classification accuracy across several application domains and their ability to handle the high-dimensional, sparse data that is characteristic of textual representations. Recently, there has been increased interest in exploiting background knowledge, such as ontologies and corpus-based statistical knowledge, in text categorization. It has been shown that replacing standard kernel functions, such as the linear kernel, with customized kernel functions that take advantage of this background knowledge can increase the performance of SVM in the text classification domain. Based on this, we propose a novel semantic smoothing kernel for SVM. The suggested approach is based on a meaning measure, which calculates the meaningfulness of terms in the context of classes. The document vectors are smoothed according to these class-contextual meaning values of the terms. Since we make direct use of class information in the smoothing process, the result can be considered a supervised smoothing kernel. The meaning measure is based on the Helmholtz principle from Gestalt theory and has previously been applied to several text mining applications, such as document summarization and feature extraction. However, to the best of our knowledge, ours is the first study to use the meaning measure in a supervised setting to build a semantic kernel for SVM. We evaluated the proposed approach through a large number of experiments on well-known textual datasets and present results for different experimental conditions.
We compare our results with traditional SVM kernels, such as the linear kernel, as well as with several corpus-based semantic kernels. Our results show that the proposed approach outperforms the other kernels in classification performance.
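The general shape of a semantic smoothing kernel can be sketched as follows. This is a minimal illustration, not the paper's method: the class-conditional meaning matrix `M` is assumed as an input here, whereas the paper derives it from the Helmholtz principle; the toy values of `X` and `M` are invented for demonstration.

```python
import numpy as np

def smoothing_kernel(X, M):
    """Sketch of a class-based semantic smoothing kernel.

    X : (n_docs, n_terms) term-frequency matrix.
    M : (n_terms, n_classes) per-class meaning values of terms
        (assumed given; the paper computes these from the
        Helmholtz principle).

    Each document is smoothed into class-meaning space, and the
    kernel is the inner product there: K(x, y) = <xM, yM>.
    """
    S = X @ M          # smooth documents using class-contextual meanings
    return S @ S.T     # Gram matrix of the smoothed vectors

# Two toy documents over three terms, two classes.
X = np.array([[2., 0., 1.],
              [0., 3., 1.]])
M = np.array([[1.0, 0.1],
              [0.1, 1.0],
              [0.5, 0.5]])
K = smoothing_kernel(X, M)
```

Because `K` is a Gram matrix of real vectors, it is symmetric positive semi-definite and could be passed to an SVM that accepts a precomputed kernel (e.g. scikit-learn's `SVC(kernel="precomputed")`).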

    A fault detection strategy for software projects

    The existing software fault prediction models require metrics and fault data belonging to previous software versions or similar software projects. However, there are cases when previous fault data are not available, such as a software company's transition to a new project domain. In such situations, supervised learning methods using fault labels cannot be applied, creating the need for new techniques. We propose a software fault prediction strategy that uses method-level metric thresholds to predict the fault-proneness of unlabelled program modules. The technique was experimentally evaluated on the NASA datasets KC2 and JM1. Some existing approaches apply clustering techniques to group modules, a process followed by an evaluation phase.
This evaluation is performed by a software quality expert, who analyses a representative of each cluster and then labels the modules as fault-prone or not fault-prone. Our approach does not require a human expert during the prediction process. It is a fault prediction strategy that combines method-level metric thresholds as a filtering mechanism with an OR operator as a composition mechanism.
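The threshold-filter plus OR-composition idea can be sketched in a few lines. The metric names and threshold values below are illustrative assumptions, not the thresholds the paper derives for the KC2/JM1 metrics.

```python
# Hypothetical method-level metrics and thresholds (illustrative only;
# the paper derives its own thresholds for the NASA KC2/JM1 metrics).
THRESHOLDS = {
    "loc": 50,                    # lines of code
    "cyclomatic_complexity": 10,  # McCabe complexity
    "halstead_effort": 1000.0,    # Halstead effort
}

def fault_prone(module_metrics, thresholds=THRESHOLDS):
    """Each threshold acts as a filter; the OR operator composes them:
    a module is flagged fault-prone if ANY metric exceeds its threshold.
    No labelled training data or human expert is needed."""
    return any(module_metrics.get(m, 0) > t for m, t in thresholds.items())

print(fault_prone({"loc": 120, "cyclomatic_complexity": 4}))  # True
print(fault_prone({"loc": 30, "cyclomatic_complexity": 4}))   # False
```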

    Morphology based text compression

    With the spread of the Internet, the number of documents in digital media is steadily increasing, and the demand for easier and faster access to this information makes document compression important. Some of the work in the document compression field aims to exploit the morphological structure of the language. In this study, 10 different decomposition methods based on the morphological structure of the language were applied to determine the compression efficiency of Turkish and English documents, and the effects of these methods on compression performance are reported comparatively.
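The basic pipeline can be sketched as: decompose words into morphemes, recode the text so repeated stems and suffixes become frequent tokens, then hand the result to a general-purpose compressor. The fixed-length split below is a crude stand-in for a real morphological analyser (the paper compares 10 actual decomposition methods), and the `+` separator is an invented convention.

```python
import zlib

def split_morphemes(word, stem_len=4):
    """Crude stand-in for a morphological analyser: everything after
    the first `stem_len` characters is treated as a suffix.
    A real system would use language-specific morphology."""
    if len(word) > stem_len:
        return word[:stem_len], word[stem_len:]
    return word, ""

def compress_with_morphology(text):
    """Recode each word as 'stem+suffix' so repeated morphemes recur
    often, then compress with a generic back-end (zlib here)."""
    parts = []
    for w in text.split():
        stem, suffix = split_morphemes(w)
        parts.append(stem + ("+" + suffix if suffix else ""))
    recoded = " ".join(parts)
    return zlib.compress(recoded.encode("utf-8"))
```

The recoding is lossless as long as the separator does not occur in the source text, so the original can be rebuilt after decompression by rejoining stems and suffixes.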

    A Probabilistic Multi-Objective Artificial Bee Colony Algorithm for Gene Selection

    Microarray technology is widely used to report gene expression data. Datasets from this platform characteristically contain many features and few samples. In order to identify significant genes for a particular disease, the high dimensionality of microarray data must be overcome. The Artificial Bee Colony (ABC) Algorithm is a successful meta-heuristic algorithm that solves optimization problems effectively. In this paper, we propose a hybrid gene selection method for discriminatively selecting genes. We propose a new probabilistic binary Artificial Bee Colony Algorithm, namely PrBABC, that is hybridized with three different filter methods. The proposed method is applied to nine microarray datasets in order to detect distinctive genes for classifying cancer data. Results are compared with other well-known meta-heuristic algorithms: the Binary Differential Evolution Algorithm (BinDE), Binary Particle Swarm Optimization Algorithm (BinPSO), and Genetic Algorithm (GA), as well as with other methods in the literature. Experimental results show that the probabilistic self-adaptive learning strategy integrated into the employed-bee phase can boost classification accuracy with a minimal number of genes.
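The core idea of a probabilistic binary bee-colony search for feature subsets can be sketched as follows. This is a heavily simplified illustration under stated assumptions: each bee keeps a probability vector, binary solutions are sampled from it, and the vector is nudged toward the best solution found. The update rule, the toy objective, and all parameters are invented stand-ins, not the paper's PrBABC (which has employed, onlooker, and scout phases and a self-adaptive learning strategy).

```python
import random

def sketch_prbabc(fitness, n_features, n_bees=10, iters=30, seed=0):
    """Simplified probabilistic binary bee-colony search.

    Each bee holds a probability vector p over features; binary
    solutions are sampled Bernoulli(p), scored, and every p is moved
    slightly toward the best solution seen so far (a stand-in for the
    paper's self-adaptive employed-bee update, illustrative only)."""
    rng = random.Random(seed)
    ps = [[0.5] * n_features for _ in range(n_bees)]
    best, best_fit = None, float("-inf")
    for _ in range(iters):
        for p in ps:
            sol = [1 if rng.random() < pi else 0 for pi in p]
            f = fitness(sol)
            if f > best_fit:
                best, best_fit = sol, f
        # Nudge all probability vectors toward the best binary solution.
        for p in ps:
            for j in range(n_features):
                p[j] += 0.1 * (best[j] - p[j])
    return best, best_fit

# Toy objective (hypothetical): reward selecting features {0, 2},
# penalise every extra feature, mimicking "accuracy with few genes".
target = {0, 2}
def fitness(sol):
    chosen = {i for i, b in enumerate(sol) if b}
    return len(chosen & target) - 0.5 * len(chosen - target)

best, f = sketch_prbabc(fitness, n_features=5)
```

In a real gene-selection setting, `fitness` would be replaced by a classifier's cross-validated accuracy on the gene subset, optionally combined with a subset-size penalty.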

    Behcet's disease and renal failure

    Background. The aims of this study were (i) to investigate the prevalence of Behcet's disease (BD) among dialysis patients in Turkey, (ii) to report the clinical characteristics of patients with BD and end-stage renal disease (ESRD), (iii) to evaluate the effect of ESRD on the course and activity of BD and (iv) to analyse the published data about BD and renal failure. Methods. A questionnaire investigating BD among dialysis patients was submitted to 350 dialysis centres, and we obtained data for 20 596 patients from 331 dialysis centres. We submitted a second questionnaire regarding the clinical characteristics of the patients with BD and ESRD. The PubMed and Web of Science databases were used for the analysis of BD and renal failure. Results. Fourteen patients with BD were identified, giving a prevalence of 0.07% among the 20 596 dialysis patients in Turkey. None of the patients developed a new manifestation of BD after initiation of haemodialysis treatment. The analysis of previously published data on renal BD identified 67 patients with renal failure. Conclusions. The most common cause of renal failure in BD is amyloidosis. Routine urine analysis and measurement of serum creatinine and blood urea nitrogen levels are needed for early diagnosis. Vascular access-related problems are common, and the activity of BD appears to decrease in patients with ESRD after initiation of haemodialysis.